{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "02d02c5b",
   "metadata": {
    "origin_pos": 0
   },
   "source": [
    "# Transposed Convolution\n",
    ":label:`sec_transposed_conv`\n",
    "\n",
    "The CNN layers we have seen so far,\n",
    "such as convolutional layers (:numref:`sec_conv_layer`) and pooling layers (:numref:`sec_pooling`),\n",
    "typically reduce (downsample) the spatial dimensions (height and width) of the input,\n",
    "or keep them unchanged.\n",
    "In semantic segmentation\n",
    "that classifies at pixel-level,\n",
    "it will be convenient if\n",
    "the spatial dimensions of the\n",
    "input and output are the same.\n",
    "For example,\n",
    "the channel dimension at one output pixel \n",
    "can hold the classification results\n",
    "for the input pixel at the same spatial position.\n",
    "\n",
    "\n",
    "To achieve this, especially after \n",
    "the spatial dimensions are reduced by CNN layers,\n",
    "we can use another type\n",
    "of CNN layers\n",
    "that can increase (upsample) the spatial dimensions\n",
    "of intermediate feature maps.\n",
    "In this section,\n",
    "we will introduce \n",
    "*transposed convolution*, which is also called *fractionally-strided convolution* :cite:`Dumoulin.Visin.2016`, \n",
    "for reversing downsampling operations\n",
    "by the convolution.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "64ac86cc",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2023-08-18T19:40:30.453582Z",
     "iopub.status.busy": "2023-08-18T19:40:30.453269Z",
     "iopub.status.idle": "2023-08-18T19:40:33.572095Z",
     "shell.execute_reply": "2023-08-18T19:40:33.570416Z"
    },
    "origin_pos": 2,
    "tab": [
     "pytorch"
    ]
   },
   "outputs": [],
   "source": [
    "import torch\n",
    "from torch import nn\n",
    "from d2l import torch as d2l"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "67e53c32",
   "metadata": {
    "origin_pos": 3
   },
   "source": [
    "## Basic Operation\n",
    "\n",
    "Ignoring channels for now,\n",
    "let's begin with\n",
    "the basic transposed convolution operation\n",
    "with stride of 1 and no padding.\n",
    "Suppose that\n",
    "we are given a \n",
    "$n_h \\times n_w$ input tensor\n",
    "and a $k_h \\times k_w$ kernel.\n",
    "Sliding the kernel window with stride of 1\n",
    "for $n_w$ times in each row\n",
    "and $n_h$ times in each column\n",
    "yields \n",
    "a total of $n_h n_w$ intermediate results.\n",
    "Each intermediate result is\n",
    "a $(n_h + k_h - 1) \\times (n_w + k_w - 1)$\n",
    "tensor that are initialized as zeros.\n",
    "To compute each intermediate tensor,\n",
    "each element in the input tensor\n",
    "is multiplied by the kernel\n",
    "so that the resulting $k_h \\times k_w$ tensor\n",
    "replaces a portion in\n",
    "each intermediate tensor.\n",
    "Note that\n",
    "the position of the replaced portion in each\n",
    "intermediate tensor corresponds to the position of the element\n",
    "in the input tensor used for the computation.\n",
    "In the end, all the intermediate results\n",
    "are summed over to produce the output.\n",
    "\n",
    "As an example,\n",
    ":numref:`fig_trans_conv` illustrates\n",
    "how transposed convolution with a $2\\times 2$ kernel is computed for a $2\\times 2$ input tensor.\n",
    "\n",
    "\n",
    "![Transposed convolution with a $2\\times 2$ kernel. The shaded portions are a portion of an intermediate tensor as well as the input and kernel tensor elements used for the  computation.](../img/trans_conv.svg)\n",
    ":label:`fig_trans_conv`\n",
    "\n",
    "\n",
    "We can (**implement this basic transposed convolution operation**) `trans_conv` for a input matrix `X` and a kernel matrix `K`.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "8d01fcf9",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2023-08-18T19:40:33.580872Z",
     "iopub.status.busy": "2023-08-18T19:40:33.579987Z",
     "iopub.status.idle": "2023-08-18T19:40:33.594490Z",
     "shell.execute_reply": "2023-08-18T19:40:33.593361Z"
    },
    "origin_pos": 4,
    "tab": [
     "pytorch"
    ]
   },
   "outputs": [],
   "source": [
    "def trans_conv(X, K):\n",
    "    h, w = K.shape\n",
    "    Y = torch.zeros((X.shape[0] + h - 1, X.shape[1] + w - 1))\n",
    "    for i in range(X.shape[0]):\n",
    "        for j in range(X.shape[1]):\n",
    "            Y[i: i + h, j: j + w] += X[i, j] * K\n",
    "    return Y"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e33e6143",
   "metadata": {
    "origin_pos": 5
   },
   "source": [
    "In contrast to the regular convolution (in :numref:`sec_conv_layer`) that *reduces* input elements\n",
    "via the kernel,\n",
    "the transposed convolution\n",
    "*broadcasts* input elements \n",
    "via the kernel, thereby\n",
    "producing an output\n",
    "that is larger than the input.\n",
    "We can construct the input tensor `X` and the kernel tensor `K` from :numref:`fig_trans_conv` to [**validate the output of the above implementation**] of the basic two-dimensional transposed convolution operation.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "75ed0b6d",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2023-08-18T19:40:33.598934Z",
     "iopub.status.busy": "2023-08-18T19:40:33.598651Z",
     "iopub.status.idle": "2023-08-18T19:40:33.626958Z",
     "shell.execute_reply": "2023-08-18T19:40:33.625781Z"
    },
    "origin_pos": 6,
    "tab": [
     "pytorch"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[ 0.,  0.,  1.],\n",
       "        [ 0.,  4.,  6.],\n",
       "        [ 4., 12.,  9.]])"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X = torch.tensor([[0.0, 1.0], [2.0, 3.0]])\n",
    "K = torch.tensor([[0.0, 1.0], [2.0, 3.0]])\n",
    "trans_conv(X, K)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "060319a4",
   "metadata": {
    "origin_pos": 7
   },
   "source": [
    "Alternatively,\n",
    "when the input `X` and kernel `K` are both\n",
    "four-dimensional tensors,\n",
    "we can [**use high-level APIs to obtain the same results**].\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "e1ad37e5",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2023-08-18T19:40:33.632024Z",
     "iopub.status.busy": "2023-08-18T19:40:33.631545Z",
     "iopub.status.idle": "2023-08-18T19:40:33.647991Z",
     "shell.execute_reply": "2023-08-18T19:40:33.646736Z"
    },
    "origin_pos": 9,
    "tab": [
     "pytorch"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[[[ 0.,  0.,  1.],\n",
       "          [ 0.,  4.,  6.],\n",
       "          [ 4., 12.,  9.]]]], grad_fn=<ConvolutionBackward0>)"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X, K = X.reshape(1, 1, 2, 2), K.reshape(1, 1, 2, 2)\n",
    "tconv = nn.ConvTranspose2d(1, 1, kernel_size=2, bias=False)\n",
    "tconv.weight.data = K\n",
    "tconv(X)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "53afbb91",
   "metadata": {
    "origin_pos": 10
   },
   "source": [
    "## [**Padding, Strides, and Multiple Channels**]\n",
    "\n",
    "Different from in the regular convolution\n",
    "where padding is applied to input,\n",
    "it is applied to output\n",
    "in the transposed convolution.\n",
    "For example,\n",
    "when specifying the padding number\n",
    "on either side of the height and width \n",
    "as 1,\n",
    "the first and last rows and columns\n",
    "will be removed from the transposed convolution output.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "1048beb2",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2023-08-18T19:40:33.653300Z",
     "iopub.status.busy": "2023-08-18T19:40:33.652500Z",
     "iopub.status.idle": "2023-08-18T19:40:33.662731Z",
     "shell.execute_reply": "2023-08-18T19:40:33.661823Z"
    },
    "origin_pos": 12,
    "tab": [
     "pytorch"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[[[4.]]]], grad_fn=<ConvolutionBackward0>)"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tconv = nn.ConvTranspose2d(1, 1, kernel_size=2, padding=1, bias=False)\n",
    "tconv.weight.data = K\n",
    "tconv(X)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "549fc8f2",
   "metadata": {
    "origin_pos": 13
   },
   "source": [
    "In the transposed convolution,\n",
    "strides are specified for intermediate results (thus output), not for input.\n",
    "Using the same input and kernel tensors\n",
    "from :numref:`fig_trans_conv`,\n",
    "changing the stride from 1 to 2\n",
    "increases both the height and weight\n",
    "of intermediate tensors, hence the output tensor\n",
    "in :numref:`fig_trans_conv_stride2`.\n",
    "\n",
    "\n",
    "![Transposed convolution with a $2\\times 2$ kernel with stride of 2. The shaded portions are a portion of an intermediate tensor as well as the input and kernel tensor elements used for the  computation.](../img/trans_conv_stride2.svg)\n",
    ":label:`fig_trans_conv_stride2`\n",
    "\n",
    "\n",
    "\n",
    "The following code snippet can validate the transposed convolution output for stride of 2 in :numref:`fig_trans_conv_stride2`.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "72cccf5f",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2023-08-18T19:40:33.667420Z",
     "iopub.status.busy": "2023-08-18T19:40:33.666693Z",
     "iopub.status.idle": "2023-08-18T19:40:33.676004Z",
     "shell.execute_reply": "2023-08-18T19:40:33.675089Z"
    },
    "origin_pos": 15,
    "tab": [
     "pytorch"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[[[0., 0., 0., 1.],\n",
       "          [0., 0., 2., 3.],\n",
       "          [0., 2., 0., 3.],\n",
       "          [4., 6., 6., 9.]]]], grad_fn=<ConvolutionBackward0>)"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tconv = nn.ConvTranspose2d(1, 1, kernel_size=2, stride=2, bias=False)\n",
    "tconv.weight.data = K\n",
    "tconv(X)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4ecbf916",
   "metadata": {
    "origin_pos": 16
   },
   "source": [
    "For multiple input and output channels,\n",
    "the transposed convolution\n",
    "works in the same way as the regular convolution.\n",
    "Suppose that\n",
    "the input has $c_i$ channels,\n",
    "and that the transposed convolution\n",
    "assigns a $k_h\\times k_w$ kernel tensor\n",
    "to each input channel.\n",
    "When multiple output channels \n",
    "are specified,\n",
    "we will have a $c_i\\times k_h\\times k_w$ kernel for each output channel.\n",
    "\n",
    "\n",
    "As in all, if we feed $\\mathsf{X}$ into a convolutional layer $f$ to output $\\mathsf{Y}=f(\\mathsf{X})$ and create a transposed convolutional layer $g$ with the same hyperparameters as $f$ except \n",
    "for the number of output channels \n",
    "being the number of channels in $\\mathsf{X}$,\n",
    "then $g(Y)$ will have the same shape as $\\mathsf{X}$.\n",
    "This can be illustrated in the following example.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "5f8aac99",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2023-08-18T19:40:33.679264Z",
     "iopub.status.busy": "2023-08-18T19:40:33.678924Z",
     "iopub.status.idle": "2023-08-18T19:40:33.724131Z",
     "shell.execute_reply": "2023-08-18T19:40:33.723022Z"
    },
    "origin_pos": 18,
    "tab": [
     "pytorch"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X = torch.rand(size=(1, 10, 16, 16))\n",
    "conv = nn.Conv2d(10, 20, kernel_size=5, padding=2, stride=3)\n",
    "tconv = nn.ConvTranspose2d(20, 10, kernel_size=5, padding=2, stride=3)\n",
    "tconv(conv(X)).shape == X.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "65785832",
   "metadata": {
    "origin_pos": 19
   },
   "source": [
    "## [**Connection to Matrix Transposition**]\n",
    ":label:`subsec-connection-to-mat-transposition`\n",
    "\n",
    "The transposed convolution is named after\n",
    "the matrix transposition.\n",
    "To explain,\n",
    "let's first\n",
    "see how to implement convolutions\n",
    "using matrix multiplications.\n",
    "In the example below, we define a $3\\times 3$ input `X` and a $2\\times 2$ convolution kernel `K`, and then use the `corr2d` function to compute the convolution output `Y`.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "54c1abd6",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2023-08-18T19:40:33.727935Z",
     "iopub.status.busy": "2023-08-18T19:40:33.727252Z",
     "iopub.status.idle": "2023-08-18T19:40:33.735227Z",
     "shell.execute_reply": "2023-08-18T19:40:33.734426Z"
    },
    "origin_pos": 20,
    "tab": [
     "pytorch"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[27., 37.],\n",
       "        [57., 67.]])"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X = torch.arange(9.0).reshape(3, 3)\n",
    "K = torch.tensor([[1.0, 2.0], [3.0, 4.0]])\n",
    "Y = d2l.corr2d(X, K)\n",
    "Y"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "383ed5e7",
   "metadata": {
    "origin_pos": 21
   },
   "source": [
    "Next, we rewrite the convolution kernel `K` as\n",
    "a sparse weight matrix `W`\n",
    "containing a lot of zeros. \n",
    "The shape of the weight matrix is ($4$, $9$),\n",
    "where the non-zero elements come from\n",
    "the convolution kernel `K`.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "5c83e2aa",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2023-08-18T19:40:33.738704Z",
     "iopub.status.busy": "2023-08-18T19:40:33.738150Z",
     "iopub.status.idle": "2023-08-18T19:40:33.746690Z",
     "shell.execute_reply": "2023-08-18T19:40:33.745684Z"
    },
    "origin_pos": 22,
    "tab": [
     "pytorch"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[1., 2., 0., 3., 4., 0., 0., 0., 0.],\n",
       "        [0., 1., 2., 0., 3., 4., 0., 0., 0.],\n",
       "        [0., 0., 0., 1., 2., 0., 3., 4., 0.],\n",
       "        [0., 0., 0., 0., 1., 2., 0., 3., 4.]])"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def kernel2matrix(K):\n",
    "    k, W = torch.zeros(5), torch.zeros((4, 9))\n",
    "    k[:2], k[3:5] = K[0, :], K[1, :]\n",
    "    W[0, :5], W[1, 1:6], W[2, 3:8], W[3, 4:] = k, k, k, k\n",
    "    return W\n",
    "\n",
    "W = kernel2matrix(K)\n",
    "W"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "125346be",
   "metadata": {
    "origin_pos": 23
   },
   "source": [
    "Concatenate the input `X` row by row to get a vector of length 9. Then the matrix multiplication of `W` and the vectorized `X` gives a vector of length 4.\n",
    "After reshaping it, we can obtain the same result `Y`\n",
    "from the original convolution operation above:\n",
    "we just implemented convolutions using matrix multiplications.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "444dbc7d",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2023-08-18T19:40:33.750344Z",
     "iopub.status.busy": "2023-08-18T19:40:33.749752Z",
     "iopub.status.idle": "2023-08-18T19:40:33.757265Z",
     "shell.execute_reply": "2023-08-18T19:40:33.756389Z"
    },
    "origin_pos": 24,
    "tab": [
     "pytorch"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[True, True],\n",
       "        [True, True]])"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Y == torch.matmul(W, X.reshape(-1)).reshape(2, 2)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f470126e",
   "metadata": {
    "origin_pos": 25
   },
   "source": [
    "Likewise, we can implement transposed convolutions using\n",
    "matrix multiplications.\n",
    "In the following example,\n",
    "we take the $2 \\times 2$ output `Y` from the above\n",
    "regular convolution\n",
    "as input to the transposed convolution.\n",
    "To implement this operation by multiplying matrices,\n",
    "we only need to transpose the weight matrix `W`\n",
    "with the new shape $(9, 4)$.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "e1834374",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2023-08-18T19:40:33.761069Z",
     "iopub.status.busy": "2023-08-18T19:40:33.760455Z",
     "iopub.status.idle": "2023-08-18T19:40:33.767618Z",
     "shell.execute_reply": "2023-08-18T19:40:33.766760Z"
    },
    "origin_pos": 26,
    "tab": [
     "pytorch"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[True, True, True],\n",
       "        [True, True, True],\n",
       "        [True, True, True]])"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Z = trans_conv(Y, K)\n",
    "Z == torch.matmul(W.T, Y.reshape(-1)).reshape(3, 3)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "08875952",
   "metadata": {
    "origin_pos": 27
   },
   "source": [
    "Consider implementing the convolution\n",
    "by multiplying matrices.\n",
    "Given an input vector $\\mathbf{x}$\n",
    "and a weight matrix $\\mathbf{W}$,\n",
    "the forward propagation function of the convolution\n",
    "can be implemented\n",
    "by multiplying its input with the weight matrix\n",
    "and outputting a vector \n",
    "$\\mathbf{y}=\\mathbf{W}\\mathbf{x}$.\n",
    "Since backpropagation\n",
    "follows the chain rule\n",
    "and $\\nabla_{\\mathbf{x}}\\mathbf{y}=\\mathbf{W}^\\top$,\n",
    "the backpropagation function of the convolution\n",
    "can be implemented\n",
    "by multiplying its input with the \n",
    "transposed weight matrix $\\mathbf{W}^\\top$.\n",
    "Therefore, \n",
    "the transposed convolutional layer\n",
    "can just exchange the forward propagation function\n",
    "and the backpropagation function of the convolutional layer:\n",
    "its forward propagation \n",
    "and backpropagation functions\n",
    "multiply their input vector with \n",
    "$\\mathbf{W}^\\top$ and $\\mathbf{W}$, respectively.\n",
    "\n",
    "\n",
    "## Summary\n",
    "\n",
    "* In contrast to the regular convolution that reduces input elements via the kernel, the transposed convolution broadcasts input elements via the kernel, thereby producing an output that is larger than the input.\n",
    "* If we feed $\\mathsf{X}$ into a convolutional layer $f$ to output $\\mathsf{Y}=f(\\mathsf{X})$ and create a transposed convolutional layer $g$ with the same hyperparameters as $f$ except for the number of output channels being the number of channels in $\\mathsf{X}$, then $g(Y)$ will have the same shape as $\\mathsf{X}$.\n",
    "* We can implement convolutions using matrix multiplications. The transposed convolutional layer can just exchange the forward propagation function and the backpropagation function of the convolutional layer.\n",
    "\n",
    "\n",
    "## Exercises\n",
    "\n",
    "1. In :numref:`subsec-connection-to-mat-transposition`, the convolution input `X` and the transposed convolution output `Z` have the same shape. Do they have the same value? Why?\n",
    "1. Is it efficient to use matrix multiplications to implement convolutions? Why?\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "07bcc93d",
   "metadata": {
    "origin_pos": 29,
    "tab": [
     "pytorch"
    ]
   },
   "source": [
    "[Discussions](https://discuss.d2l.ai/t/1450)\n"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  },
  "required_libs": []
 },
 "nbformat": 4,
 "nbformat_minor": 5
}